Abstract

Every government and nation in the world strives to advance education
because it is a pivotal element of society. According to studies, student
performance has declined since the coronavirus pandemic that disrupted life
in 2020, which emphasizes the need to treat this problem more seriously and
to pinpoint both its causes and effective treatments. The educational system
has been impacted in numerous ways. By analyzing and assessing students’
academic performance while taking a variety of aspects into account, the
project aims to improve academic, professional, and university guidance.
Students will be surveyed using a questionnaire for this project, and data
from the UCI Machine Learning Repository will also be used. The work seeks
to understand the various aspects that affect students’ performance and to
predict students’ success by using various machine learning algorithms to
analyze student data and outcomes. The three categories of student
characteristics are personal traits, academic attributes, and behavioral
traits. Students’ learning achievements are influenced by their behavioral
traits. It has been noted that regression and classification models are
frequently used in prediction.

Keywords: questionnaire; algorithms; outcomes; treatments


1. Introduction 


Education is a crucial aspect of society and govern- 
ments worldwide strive to improve this area. How- 
ever, since the outbreak of the COVID-19 pandemic 
in 2020, student performance has dropped, high- 
lighting the urgent need to identify effective solu- 
tions and factors that affect performance. The edu- 
cational system has been impacted in various ways. 
The goal of this project is to improve academic, 
professional, and university guidance by assessing 
student performance while taking into account var- 
ious factors. The data will be collected through a 
questionnaire-based survey of students and from the 
UCI Machine Learning Repository. Three types of
student characteristics will be considered: educational
features, behavioral features, and personal features.
It has been observed that prediction is generally made
using classification methods and regression.


2. Literature Survey 


Rajalaxmi R R et al. (R) investigated the use
of regression models for predicting the academic
performance of engineering students. The study
compared linear multivariate regression and support
vector machine algorithms with other methods.
The findings showed that the support vector machine
algorithm had lower sensitivity than the other
algorithms; multiple measures were used to validate
the models and to estimate how likely their
predictions were to be accurate.


Harikumar Pallathadka et al. (Pallathadka) conducted
research on the prediction and classification
of student performance data using machine learning
algorithms. The study utilized a decision tree
and a fuzzy genetic algorithm to anticipate academic
performance based on prior academic results. The
relationship between students’ talents and interests
and their performance was also explored using
the UCI machine learning dataset. (Al-Shehri et al.)
This analysis could help teachers identify students
who require support and provide it in a targeted way,
ultimately leading to improved instruction quality.


A student performance predictive model that used
behavioral features achieved up to a 22.1% improvement
in accuracy compared with the results obtained when
such features were removed, and it achieved up to a
25.8% accuracy improvement using ensemble methods.
Building a better student performance predictive
model therefore requires making use of students’
behavioral features, and ensemble methods can
improve the model further. (Tjandra)


Pranav Dabhade et al. (Dabhade et al.) presented a
paper on data mining in education to predict student
academic performance using machine learning
algorithms, in which they used multiple linear regression
and support vector regression. To evaluate the
influence of features and attributes on the desired result,
built-in models of machine learning algorithms were
selected from scikit-learn. Past performance is the
most important factor for predicting future performance.
The results obtained show that there is a relation- 
ship between student behavior and academic perfor- 
mance. The research can be extended using neural 
networks and other regression algorithms. The big- 
ger the data set, the better the prediction. The use of 
behavioral elements is also important for predicting 
a student’s future academic performance. (Van Der 
Schaar) 


3. Design 


A training model is the data set used to train a machine
learning algorithm. It consists of sample output data
and the corresponding sets of input data that affect the
output. (Juan and Gomez-Pulido) The training model
is used to process the input data using an algorithm
and to correlate the processed output with the sample
output.

[Figure: training pipeline diagram; recoverable box labels: Training the Dataset, Model Evaluation]


FIGURE 1. Training Model 


FIGURE 2. Flow of a website 


Based on the trained model, we can create
a webpage that predicts student performance, allowing
us to pay closer attention to weak students.
FIGURE 2 above shows the flow chart of the website.


3.1. Data Collection: 


Data collection gathers and measures information on
variables of interest. In our design, we mainly
collect data for training our model from the UCI
Machine Learning Repository and a Google Forms-based
questionnaire. (Nabil, Seyam, and Abou-Elfetouh)
FIGURE 1


3.2. Data Pre-processing:


Data preprocessing is a data mining technique used to
transform the raw data into a useful and effective
format. Here we clean the data by removing all the
null values and normalizing the remaining values so
that the data can be used to train our model.
(Umar) FIGURE 1


3.3. Feature selection: 


Feature selection is the process of reducing the number
of input variables when developing a predictive
model. FIGURE 1
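
As a minimal sketch of one way to reduce the number of input variables, the snippet below ranks features by the absolute Pearson correlation with a pass/fail label and keeps the top k. The feature names, the toy values, and the helper names `pearson` and `top_k_features` are illustrative assumptions; the selection procedure actually used in this work is not specified.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def top_k_features(columns, target, k):
    """Return the k feature names with the largest |correlation| to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: column names and values are made up for illustration.
columns = {
    "failures": [0, 3, 0, 2, 0, 1],
    "goout":    [2, 5, 1, 4, 2, 4],
    "address":  [1, 1, 1, 1, 1, 1],  # constant, so it scores 0
}
passed = [1, 0, 1, 0, 1, 0]  # 1 = pass, 0 = fail

selected = top_k_features(columns, passed, k=2)
print(selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a first filter rather than a complete selection method.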

3.4. Training the dataset:


After a thorough literature review, we decided
to use SVM and KNN classification models
(we may change or include additional models if
needed). (Casillas et al.) FIGURE 1
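
The following is a hedged sketch of how the two classifiers could be trained with scikit-learn. The synthetic data from `make_classification`, the 30% hold-out, the scaling step, and the values `C=1.0` and `n_neighbors=5` are all assumptions made for illustration. A 30% hold-out of 395 rows yields 119 test rows, which happens to match the support total in the KNN classification report, but the split actually used is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the student data (assumption): 395 rows, 30 features.
X, y = make_classification(
    n_samples=395, n_features=30, n_informative=8, random_state=0
)

# Hold out 30% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Both models are distance/margin based, so features are standardized first.
models = {
    "SVM (linear kernel)": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: test accuracy = {accuracies[name]:.2f}")
```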


3.5. Model Evaluation: 


Model evaluation is the process of using different
evaluation metrics to understand a machine learning
model’s performance, as well as its strengths
and weaknesses. (Schmidt-Thieme) FIGURE 1.
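
To make the evaluation metrics concrete, here is a small dependency-free sketch that computes accuracy, precision, recall, and F1 for a binary pass/fail prediction. The function name `binary_report` and the toy labels are hypothetical and only illustrate the standard definitions.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (made up): 1 = pass, 0 = fail.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

report = binary_report(y_true, y_pred)
print(report)
```

Here 4 passes are predicted correctly, 1 pass is missed, and 1 fail is predicted as a pass, so precision and recall are both 4/5 and accuracy is 6/8.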


4. Methodology 


The dataset was obtained from the UCI Machine Learning
Repository. The original data set contained some
string values, so during preprocessing we mapped
the string values to numerical values; this task
was done by a function named numerical_data. After
the string values were mapped to numerical
values, we normalized the variables; this
task was done by a function called feature_scaling.
The dataset consists of 395 rows x 31 columns.
We then plotted the correlation graph between the
student’s status (pass/fail) and the other features in the
dataset. (Sekeroglu, Dimililer, and Tuncal) (Xu)
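
The two preprocessing steps described above can be sketched as follows. Only the function names `numerical_data` and `feature_scaling` come from the text; their signatures, the min-max formula used for normalization, and the column names and values in the example are assumptions.

```python
def numerical_data(rows, mappings):
    """Map string values to numbers, column by column, returning new rows.

    `mappings` is {column: {string_value: number}}; unmapped values are kept.
    """
    return [
        {col: mappings.get(col, {}).get(val, val) for col, val in row.items()}
        for row in rows
    ]

def feature_scaling(rows, columns):
    """Min-max scale the given numeric columns to the range [0, 1]."""
    scaled = [dict(row) for row in rows]  # copy so the input is not mutated
    for col in columns:
        values = [row[col] for row in rows]
        low, span = min(values), max(values) - min(values)
        for row in scaled:
            row[col] = (row[col] - low) / span if span else 0.0
    return scaled

# Hypothetical rows; the real column names and encodings may differ.
rows = [
    {"sex": "F", "higher": "yes", "age": 18, "passed": "no"},
    {"sex": "M", "higher": "yes", "age": 15, "passed": "yes"},
    {"sex": "F", "higher": "no", "age": 22, "passed": "no"},
]
mappings = {
    "sex": {"F": 0, "M": 1},
    "higher": {"no": 0, "yes": 1},
    "passed": {"no": 0, "yes": 1},
}

numeric = numerical_data(rows, mappings)
scaled = feature_scaling(numeric, ["age"])
print(scaled)
```

Scaling matters for SVM and KNN in particular, because both rely on distances between feature vectors and an unscaled column with a large range would dominate them.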


5. Result 


Observations made from the dataset are as follows: 

1) Having educated parents had a positive impact on the student’s status.

2) Wanting to pursue higher education also had a positive impact on the
student’s status.

3) Going out with friends for long periods had a negative impact.

4) Age and past failures also had a negative impact on the student’s status.

Features that had low or no impact on the student’s status are:

1) The geographical location of the student

2) Alcohol consumption

Features that had a higher, positive impact on the student’s status are:

1) Spends limited time going out with friends

2) Is not in a romantic relationship

3) Has educated parents

4) Has a strong desire for higher studies

5) Has access to the internet

6) Is healthy

7) Studies for at least 10 hours a week


              precision    recall  f1-score   support

         0.0       0.57      0.57      0.57        30
         1.0       0.85      0.85      0.85        89

    accuracy                           0.78       119
   macro avg       0.71      0.71      0.71       119
weighted avg       0.78      0.78      0.78       119


FIGURE 3. Classification Report of KNN Model 
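
The averaged rows of the KNN report can be checked from the two per-class rows. The short calculation below uses only the per-class scores (0.57 and 0.85) and supports (30 and 89) read from the report; the variable names are illustrative. Because precision, recall, and F1 are equal within each class row, the same arithmetic applies to all three columns.

```python
# Per-class rows read from the KNN report: class -> (score, support).
per_class = {"0.0": (0.57, 30), "1.0": (0.85, 89)}

total_support = sum(support for _, support in per_class.values())

# Macro average: unweighted mean of the per-class scores.
macro_avg = sum(score for score, _ in per_class.values()) / len(per_class)

# Weighted average: per-class scores weighted by their support.
weighted_avg = (
    sum(score * support for score, support in per_class.values()) / total_support
)

print(total_support, round(macro_avg, 2), round(weighted_avg, 2))
```

The results, 119 test samples with a macro average of 0.71 and a weighted average of 0.78, agree with the report. The gap between the two averages reflects the class imbalance: the larger class (89 samples) is predicted noticeably better than the smaller one (30 samples).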


[Figure: KNN ROC curve; axis label: true positive]


FIGURE 4. ROC Curve of KNN Model


KNN (k-nearest neighbors) is a machine learning algorithm
used for classification and regression tasks. The model’s
accuracy increased to 78%, a result that depends on
factors such as data quality and quantity, feature
selection, hyperparameter tuning, and model
complexity. FIGURE 3, FIGURE 4
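
For readers unfamiliar with KNN, the following dependency-free sketch shows the core idea for classification: a query point receives the majority label of its k nearest training points. The two features, their values, and the function name `knn_predict` are hypothetical; this is an illustration of the algorithm, not the model trained in this work.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy, already scaled features (assumption): (time going out, study time).
train_X = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.9, 0.2), (0.8, 0.1), (0.7, 0.3)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = pass, 0 = fail

# k is the main hyperparameter; here every choice gives the same answer.
for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, (0.25, 0.75), k=k))
```

In practice k is chosen by comparing validation accuracy across several values, which is one form of the hyperparameter tuning mentioned above.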


[Figure: ROC curve: SVM linear kernel; axis label as extracted: false negative; legend: SVM linear kernel]


FIGURE 5. ROC Curve of SVM Model


Support vector machines, or SVMs, are powerful
machine learning algorithms that are commonly
used for classification and regression tasks. SVM
performance, like that of KNN, is affected by the
same variety of factors. After tuning to the optimal
C value, the model’s accuracy increased to 84%. Our
model performs better with the SVM linear kernel
(FIGURE 5) than with other kernels.
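
One common way to find a good C value and compare kernels is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the candidate values for C, the list of kernels, the 5-fold setting, and the data itself are assumptions, since the search procedure used in this work is not described.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the student data (assumption): 395 rows, 30 features.
X, y = make_classification(
    n_samples=395, n_features=30, n_informative=8, random_state=0
)

# Scaling is part of the pipeline so it is refit inside every CV fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

C controls the trade-off between a wide margin and misclassified training points: small values regularize more strongly, while large values fit the training data more closely.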


6. Comparison of Results 


The previous models only included personal 
and academic characteristics, whereas our model 
involves personal, academic, and behavioral charac- 
teristics or traits. 

The addition of behavioral features improved the
model’s accuracy. This is a common approach in
machine learning: model performance is improved
by adding new features that capture additional
information in the data.

It is important to note that the improvement in
accuracy is not always significant, and it is also
dependent on the quality of the new features and
their ability to capture the underlying patterns in the
data. It is also critical to choose and pre-process
the new features carefully to avoid overfitting or
adding noise to the model.


7. Conclusion 


In conclusion, the student performance and difficul- 
ties prediction machine learning project has shown 
promising results in predicting academic perfor- 
mance and identifying potential challenges for stu- 
dents. By analyzing various features such as atten- 
dance, grades, and demographic information, the 
model was able to make accurate predictions and 
provide insights to support student success. 

However, it’s important to note that machine 
learning models are not perfect and there are limita- 
tions to their predictions. The accuracy of the model 
depends on the quality of the data used for training 
and testing, as well as the features selected for anal- 
ysis. Therefore, it’s important to continuously eval- 
uate and refine the model to improve its accuracy 
and usefulness. 

Overall, the student performance and difficulties 
prediction machine learning project has the potential 
to assist educators and administrators in identifying 
at-risk students and providing targeted interventions 
to support their academic success. 